Support async ChannelMonitorUpdate persist for claims against closed channels #3414

Open

TheBlueMatt wants to merge 9 commits into main from 2024-09-async-persist-claiming-from-closed-chan

Conversation

@TheBlueMatt (Collaborator) commented Nov 18, 2024

Based on #3413, this finally implements async ChannelMonitorUpdate persistence against closed channels. After all the prep, it's...not all that bad. Would still like to write an additional test for the second-to-last commit, though the changes in that commit do show it has some coverage.

@TheBlueMatt added this to the 0.1 milestone Nov 18, 2024

codecov bot commented Nov 18, 2024

Codecov Report

Attention: Patch coverage is 88.25911% with 29 lines in your changes missing coverage. Please review.

Project coverage is 89.73%. Comparing base (1a8bf62) to head (5d5971c).

Files with missing lines                         Patch %   Lines
lightning/src/ln/channelmanager.rs               80.57%    17 Missing and 10 partials ⚠️
lightning/src/ln/chanmon_update_fail_tests.rs    98.13%    1 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main    #3414   +/-   ##
=======================================
  Coverage   89.73%   89.73%           
=======================================
  Files         130      130           
  Lines      107793   107860   +67     
  Branches   107793   107860   +67     
=======================================
+ Hits        96727    96793   +66     
+ Misses       8662     8656    -6     
- Partials     2404     2411    +7     


@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from 8fca379 to fb56bd3 on November 24, 2024 15:05
@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch 2 times, most recently from d581b8d to 76b3a99 on December 6, 2024 19:18

@TheBlueMatt (Collaborator, author) commented Dec 6, 2024

Rebased on #3413 after merge. I promise this one isn't quite as bad @jkczyz / @valentinewallace.

@jkczyz (Contributor) left a comment:


Just a few things that jumped out at me. Will need to take another pass at this.

lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
Comment on lines -13235 to +13190:

-	let latest_update_id = monitor.get_latest_update_id();
-	update.update_id = latest_update_id.saturating_add(1);
+	let latest_update_id = monitor.get_latest_update_id().saturating_add(1);
+	update.update_id = latest_update_id;

Contributor:

Was this behavior wrong previously? (i.e., using the previous latest_update_id below)

Collaborator (author):

The semantics have changed - because we're setting the value at update creation, we need the map to contain the value after this update, rather than the value before it (which was previously used to set the update's value when it was applied).
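
For illustration, a minimal sketch of the new ordering - a plain HashMap stands in for the real PeerState tracking and the types are simplified, so this is not the actual LDK code:

use std::collections::HashMap;

struct MonitorUpdate { update_id: u64 }

// Stand-in for PeerState::closed_channel_monitor_update_ids.
fn assign_update_id_at_creation(
	map: &mut HashMap<[u8; 32], u64>, channel_id: [u8; 32],
	update: &mut MonitorUpdate,
) {
	// The map stores the id of the last update *generated*, so this
	// update's id is one past whatever is recorded.
	let next_id = map.get(&channel_id).copied().unwrap_or(0).saturating_add(1);
	update.update_id = next_id;
	// Record the post-update value: another update generated before this
	// one is applied must still receive a strictly larger id.
	map.insert(channel_id, next_id);
}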

lightning/src/ln/channelmanager.rs: resolved review thread (outdated)

@valentinewallace (Contributor) left a comment:


Will take another pass as well but overall looks good

lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
// If the A<->B channel was closed before we reload, we'll replay the claim against it on
// reload, causing the `PaymentForwarded` event to get replayed.
let evs = nodes[1].node.get_and_clear_pending_events();
assert_eq!(evs.len(), if close_chans_before_reload { 2 } else { 1 });

Contributor:

For my understanding, does this PR make the duplicate-PaymentForwarded issue worse or is it the same as before?

Collaborator (author):

Uhhh, if you're using async monitor updating, maybe? It shouldn't change anything for non-async monitor update users, but of course for async monitor update users it should only change things (including a new spurious event) in cases where what we did before was just unsafe.

@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from 76b3a99 to 60653bb on December 11, 2024 16:40
lightning/src/ln/channelmanager.rs: resolved review thread
}
if let Some((_, funding_txo, _, update)) = $shutdown_res_mut.monitor_update.take() {
handle_new_monitor_update!($self, funding_txo, update, $peer_state,
$channel_context, REMAIN_LOCKED_UPDATE_ACTIONS_PROCESSED_LATER);
}
// If there's a possibility that we need to generate further monitor updates for this
// channel, we need to store the last update_id of it. However, we don't want to insert

Contributor:

~10 lines down, it looks like we're setting the update id in closed_channel_monitor_update_ids to the update_id prior to the below background event update_id, but it seems like after 6489550 the update_id in said map should no longer ignore pending background events?

Collaborator (author):

Grrr, good catch.

@@ -7237,37 +7237,42 @@ where

 let mut preimage_update = ChannelMonitorUpdate {
 	update_id: 0, // set in set_closed_chan_next_monitor_update_id
-	counterparty_node_id: prev_hop.counterparty_node_id,
+	counterparty_node_id: None,

Contributor:

What's the reason for this change?

Collaborator (author):

Mostly spurious - I think when moving this code around for the 10th time at some point I didn't have the counterparty id readily available, so I just removed it since it doesn't matter much (it's just used to keep the ChannelMonitor's counterparty_node_id field updated if the ChannelMonitor was built before that field was added). I went ahead and re-added it, though.

lightning/src/ln/chanmon_update_fail_tests.rs: resolved review thread
Comment on lines +13646 to +13643
let in_flight_updates = per_peer_state.in_flight_monitor_updates
.entry(*funding_txo)
.or_insert_with(Vec::new);
debug_assert!(!in_flight_updates.iter().any(|upd| upd == update));
in_flight_updates.push(update.clone());

Contributor:

Missing test coverage for this

Collaborator (author):

Not sure what is missing test coverage here? If I add a panic here a ton of tests fail.

Contributor:

Oh, I had commented the lines out. Probably all tests passed because updates are duplicated in pending background events?

Collaborator (author):

Right, it's duplicated there and handle_new_monitor_update ultimately happily treats it as an actually-new update and inserts it into the queue. I spent a bit trying to figure out how to test this, but I don't actually think it's externally visible.

Comment on lines +13646 to +13643
let in_flight_updates = per_peer_state.in_flight_monitor_updates
.entry(*funding_txo)
.or_insert_with(Vec::new);
debug_assert!(!in_flight_updates.iter().any(|upd| upd == update));
in_flight_updates.push(update.clone());

Contributor:

Did you mean to make this part of the DRYing commit?

@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from 60653bb to c199059 on December 11, 2024 21:14

@valentinewallace (Contributor) left a comment:


Basically LGTM, I'm good with a squash

lightning/src/ln/channelmanager.rs: resolved review thread
lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
lightning/src/ln/channelmanager.rs: resolved review thread
@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from c199059 to cd65840 on December 12, 2024 19:57

@TheBlueMatt (Collaborator, author) commented:
Squashed:

$ git diff-tree -U1 c19905929 cd658400c
diff --git a/lightning/src/ln/channelmanager.rs b/lightning/src/ln/channelmanager.rs
index cfd5de805..58a04a3e5 100644
--- a/lightning/src/ln/channelmanager.rs
+++ b/lightning/src/ln/channelmanager.rs
@@ -13620,3 +13620,3 @@ where
 				}
-				let mut per_peer_state =  per_peer_state.get(counterparty_node_id)
+				let mut per_peer_state = per_peer_state.get(counterparty_node_id)
 					.expect("If we have pending updates for a channel it must have an entry")

lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
Comment on lines 7232 to 7237
// Note that the below is race-y - we set the `update_id` here and then drop the peer_state
// lock before applying the update in `apply_post_close_monitor_update` (or via the
// background events pipeline). During that time, some other update could be created and
then applied, resulting in `ChannelMonitorUpdate`s being applied out of order and causing
// a panic.
Self::set_closed_chan_next_monitor_update_id(&mut *peer_state, prev_hop.channel_id, &mut preimage_update);

Contributor:

Looks like this was supposed to be applied to 94d0735?

Contributor:

Looks like this fixup (now 7ce4e99) is still applied to the wrong commit?

Collaborator (author):

Oops, not sure how I did that. The commit applies cleanly one up.

lightning/src/ln/channelmanager.rs: resolved review thread (outdated)
Now that we track the latest `ChannelMonitorUpdate::update_id` for
each closed channel in
`PeerState::closed_channel_monitor_update_ids`, we should always
have a `PeerState` entry for the channel counterparty any time we
go to claim an HTLC on a channel, even if it's closed.

Here we make this a hard assertion as we'll need to access that
`PeerState` in the coming commits to track in-flight updates
against closed channels.
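
In code terms this is roughly the following shape (a sketch with simplified types, not the exact diff; the expect message matches the one visible in the squashed diff earlier in the thread):

use std::collections::HashMap;
use std::sync::{Mutex, MutexGuard};

struct PeerState { /* includes closed_channel_monitor_update_ids */ }

// Sketch: with every closed channel's counterparty tracked per-peer, a
// missing entry during claim is a bug rather than a soft error, so the
// lookup becomes a hard assertion.
fn peer_state_for_claim<'a>(
	per_peer_state: &'a HashMap<[u8; 33], Mutex<PeerState>>,
	counterparty_node_id: &[u8; 33],
) -> MutexGuard<'a, PeerState> {
	per_peer_state.get(counterparty_node_id)
		.expect("If we have pending updates for a channel it must have an entry")
		.lock().unwrap()
}
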
In c99d3d7 we added a new
`apply_post_close_monitor_update` method which takes a
`ChannelMonitorUpdate` (possibly) for a channel which has been
closed, sets the `update_id` to the right value to keep our updates
well-ordered, and then applies it.

Setting the `update_id` at application time here is fine - updates
don't really have an order after the channel has been closed, they
can be applied in any order - and was done for practical reasons
as calculating the right `update_id` at generation time takes a
bit more work on startup, and was impossible without new
assumptions during claim.

In the previous commit we added exactly the new assumption we need
at claiming (as it's required for the next few commits anyway), so
now the only thing stopping us is the extra complexity.

In the coming commits, we'll move to tracking post-close
`ChannelMonitorUpdate`s as in-flight like any other updates, which
requires having an `update_id` at generation-time so that we know
what updates are still in-flight.

Thus, we go ahead and eat the complexity here, creating
`update_id`s when the `ChannelMonitorUpdate`s are generated for
closed-channel updates, like we do for channels which are still
live.

We also ensure that we always insert `ChannelMonitorUpdate`s in the
pending updates set when we push the background event, avoiding a
race where we push an update as a background event, then while it's
processing another update finishes and the post-update actions get
run.
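
A minimal sketch of that last point, with simplified stand-in types (the or_insert_with/debug_assert shape mirrors the lines quoted in the review threads above):

use std::collections::HashMap;

#[derive(Clone, PartialEq)]
struct Update { update_id: u64 }
struct BackgroundEvent { update: Update }

// Record the update as in-flight *before* queueing the background event, so
// that if another update completes while this event is still pending, the
// post-update actions see this one as outstanding and don't run early.
fn queue_post_close_update(
	in_flight: &mut HashMap<[u8; 36], Vec<Update>>,
	background_events: &mut Vec<BackgroundEvent>,
	funding_txo: [u8; 36], update: Update,
) {
	let updates = in_flight.entry(funding_txo).or_insert_with(Vec::new);
	debug_assert!(!updates.iter().any(|upd| *upd == update));
	updates.push(update.clone());
	background_events.push(BackgroundEvent { update });
}
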
@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from cd65840 to 8be0b8e on December 13, 2024 16:52
In d1c340a we added support in
`handle_new_monitor_update!` for handling updates without dropping
locks.

In the coming commits we'll start handling `ChannelMonitorUpdate`s
"like normal" for updates against closed channels. Here we set up
the first step by adding a new `POST_CHANNEL_CLOSE` variant on
`handle_new_monitor_update!` which attempts to handle the
`ChannelMonitorUpdate` and handles completion actions if it
finishes immediately, just like the pre-close variant.
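
Conceptually the new variant behaves as below (a sketch with hypothetical names; the status enum stands in for LDK's ChannelMonitorUpdateStatus):

// Stand-in for ChannelMonitorUpdateStatus.
enum UpdateStatus { Completed, InProgress }

// Sketch of the POST_CHANNEL_CLOSE flavor: apply the update and, if it
// persisted synchronously, run the monitor-update completion actions
// immediately - mirroring the pre-close variant, but with no Channel to
// resume.
fn handle_post_close_update(
	apply_update: impl FnOnce() -> UpdateStatus,
	run_completion_actions: impl FnOnce(),
) {
	match apply_update() {
		UpdateStatus::Completed => run_completion_actions(),
		// Async persist still pending: the completion actions run later,
		// once the user signals that persistence finished.
		UpdateStatus::InProgress => {},
	}
}
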
One of the largest gaps in our async persistence functionality has
been preimage (claim) updates to closed channels. Here we finally
implement support for this (for updates at runtime).

Thanks to all the work we've built up over the past many commits,
this is a well-contained patch within `claim_mpp_part`, pushing
the generated `ChannelMonitorUpdate`s through the same pipeline we
use for open channels.

Sadly we can't use the `handle_new_monitor_update` macro wholesale
as it handles the `Channel` resumption as well which we don't do
here.
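
Put together, the runtime claim path inside claim_mpp_part is roughly the following (sketch only; simplified types and hypothetical helper names):

use std::collections::HashMap;

#[derive(Clone, PartialEq)]
struct Update { update_id: u64 }
enum UpdateStatus { Completed, InProgress }

// Sketch: track the preimage update as in-flight like any open-channel
// update, apply it, and only run completion actions if persistence
// finished synchronously.
fn claim_against_closed_channel(
	in_flight: &mut HashMap<[u8; 36], Vec<Update>>, funding_txo: [u8; 36],
	preimage_update: Update,
	apply: impl FnOnce(&Update) -> UpdateStatus,
	run_completion_actions: impl FnOnce(),
) {
	in_flight.entry(funding_txo).or_insert_with(Vec::new)
		.push(preimage_update.clone());
	if let UpdateStatus::Completed = apply(&preimage_update) {
		// Persisted synchronously: clear it from in-flight and run the
		// completion actions (what handle_new_monitor_update would do,
		// minus the Channel resumption closed channels don't need).
		in_flight.get_mut(&funding_txo).unwrap()
			.retain(|upd| *upd != preimage_update);
		run_completion_actions();
	}
}
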
On startup, we walk the preimages and payment HTLC sets on all our
`ChannelMonitor`s, re-claiming all payments which we recently
claimed. This ensures all HTLCs in any claimed payments are claimed
across all channels.

In doing so, we expect to see the same payment multiple times,
after all it may have been received as multiple HTLCs across
multiple channels. In such cases, there's no reason to redundantly
claim the same set of HTLCs again and again. In the current code,
doing so may lead to redundant `PaymentClaimed` events, and in a
coming commit will instead cause an assertion failure.
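
A sketch of the dedup, assuming a simplified startup walk (names hypothetical):

use std::collections::HashSet;

// The same payment hash can appear on several ChannelMonitors - one HTLC
// per channel of an MPP payment - so each claim is only replayed once.
fn replay_startup_claims(
	monitor_payment_hashes: impl Iterator<Item = [u8; 32]>,
	mut replay_claim: impl FnMut([u8; 32]),
) {
	let mut seen = HashSet::new();
	for payment_hash in monitor_payment_hashes {
		// insert() returns false for hashes already handled, avoiding
		// redundant PaymentClaimed events for the same payment.
		if seen.insert(payment_hash) {
			replay_claim(payment_hash);
		}
	}
}
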
One of the largest gaps in our async persistence functionality has
been preimage (claim) updates to closed channels. Here we finally
implement support for this (for updates which are generated during
startup).

Thanks to all the work we've built up over the past many commits,
this is a fairly straightforward patch, removing the
immediate-completion logic from `claim_mpp_part` and adding the
required in-flight tracking logic to
`apply_post_close_monitor_update`.

Like in the during-runtime case in the previous commit, we sadly
can't use the `handle_new_monitor_update` macro wholesale as it
handles the `Channel` resumption as well which we don't do here.

This moves the common `if during_startup { push background event }
else { apply ChannelMonitorUpdate }` pattern by simply inlining it
in `handle_new_monitor_update`.
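
Schematically, the pattern being inlined (sketch with simplified types):

struct Update { update_id: u64 }

// During startup the chain::Watch can't be called yet, so the update is
// queued as a background event and replayed later; at runtime it is
// applied directly.
fn handle_new_monitor_update_sketch(
	during_startup: bool,
	background_events: &mut Vec<Update>,
	apply_update: impl FnOnce(Update),
	update: Update,
) {
	if during_startup {
		background_events.push(update);
	} else {
		apply_update(update);
	}
}
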
@TheBlueMatt force-pushed the 2024-09-async-persist-claiming-from-closed-chan branch from 8be0b8e to 5d5971c on December 13, 2024 17:30

@jkczyz (Contributor) left a comment:


LGTM. Feel free to squash, IMO.
